AI inference on the edge

Fast, lightweight, portable, rust-powered and OpenAI compatible

Powered by WasmEdge.

Applications

LLM inference
Rust+Wasm is the tech stack for LLM applications everywhere.
Lightweight. Total runtime size is 30MB as opposed 4GB for Python and 350MB for Ollama.
Fast. Full native speed on GPUs.
Portable. Single cross-platform binary on different CPUs, GPUs and OSes.
Secure. Sandboxed and isolated execution on untrusted devices.
Modern languages for inference apps.
Container-ready. Supported in Docker, containerd, Podman, and Kubernetes.
OpenAI compatible. Seamlessly integrate into the OpenAI tooling ecosystem.
Learn more | Give it a try
LLM Agent
Flows.network is a serverless platform for building complex data flow applications. Examples include
SaaS workflow automation apps
Streaming data analytics
Real-time AI processing
Quantitative and automated trading apps
R&D process automation
DevRel and community management
Cloud-native microservices
We work with cloud providers, especially edge cloud / CDN compute providers, to support microservices for web apps. Use cases include AI inference, database access, CRM, e-commerce, workflow management, and server-side rendering.
Data analytics
We work with streaming frameworks and databases to support embedded serverless functions for data filtering and analytics. The serverless functions could be database UDFs. They could also be embedded in data ingest or query result streams.

Try it out

> wasmedge --dir .:. --nn-preload default:GGML:AUTO:model_name.gguf llama-chat.wasm
Run Llama 2 inference on your own device

Zero python dependency! Take full advantage of the GPUs. Write once, run anywhere. Get started with Llama 2 series of models on your own device in 5 minutes.
Rust

Build a RAG-based LLM agent

Retrieval-argumented fmgeneration (RAG) is a very popular approach to build AI agents with external knowledge bases. Create your own in flows.network.
Rust | Example: Learn Rust

Edge AI service

Create an HTTP microservice for image classification. It runs YOLO and Mediapipe models at native GPU speed.

Get in touch

Open Source Repositories

WasmEdge

A cloud-native and edge-native WebAssembly Runtime

WasmEdge/WasmEdge
6990

dapr-wasm

A WebAssembly runtime for dapr microservices.

second-state/dapr-wasm
249

wasmedge-quickjs

A node.js compatible JavaScript runtime for WasmEdge

second-state/wasmedge-quickjs
160

wasm-learning

Building Rust functions with WebAssembly

second-state/wasm-learning
454

llama-utils

The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge.

second-state/llama-utils
15
漏2024 Second State Inc., DBA super node LLC